(54) Method for motion estimation

(57) Motion estimation (ME) is the process of determining the movements of objects between successive image frames of a video sequence. Motion-compensated (MC) image processing is the processing of images to account for the presence of motion. The present invention provides an efficient, accurate technique for motion estimation that is compatible with an MPEG video encoder and that overcomes the computational complexity of computing motion vectors using the mean square error matching criterion by reducing the computation to that of a multidimensional truncated convolution and a running sum of squares. Making use of fast convolution procedures, this method obtains the optimal motion estimate with worst-case run time comparable to the average run time of statistical speedup methods and to the run time achieved by sparse-search methods.
Description

The present invention relates to data compression and, more particularly, to the compression of digital video images.

The advent of digital representations of visual information made possible the machine manipulation of visual information. Visual communications is now a rapidly growing field spanning the telecommunications, computer, and media industries. The progress in this field is supported by the widespread availability and expanding capacities of digital transmission channels and digital storage media. Communication-based applications include ISDN videophone, videoconferencing systems, video mail, digital broadcast television, high-definition television (HDTV), and program delivery on cable and satellite. Storage-based audiovisual applications include digital laserdiscs, electronic cameras, training, education, and entertainment. Interactive multimedia applications on personal computers, workstations, and the ubiquitous world wide web are here to stay.

Low cost is desirable for the successful introduction of new communication services. Unlike the digital audio technology of the 1980's, however, many of the applications of digital video hinge on the use of data compression to reduce transmission and storage requirements. Many techniques have evolved for representing (coding) visual information by a finite sequence of integers suitable for digital processing. Corresponding to each coding procedure is a method of decoding or reconstructing the desired visual information from the coded data.

The performance of the coder-decoder is a measure of how well the reconstructed visual information achieves the goals of the system. Designers of image/video processing systems explore not only new coding techniques, but also new methods of subjective and objective evaluation. Another aspect of system performance is its efficiency. The aim of most coding methods is to produce as little digital data as possible while at the same time accomplishing the given overall goals of the system.

To facilitate industry growth and world wide interchange of digitally encoded audiovisual data, there is a demand for international standards for the coding methods and transmission formats. International standardization committees have completed three digital standards. The Joint Photographic Expert Group (JPEG) of the International Standards Organization (ISO) has specified a standard for still picture compression. The ITU (formerly the Consultative Committee on International Telephony and Telegraphy, or CCITT) proposed Recommendation H.261 for video telephony and videoconferencing (CCITT Study Group XV, 1990). The Moving Pictures Expert Group (MPEG) of ISO has completed MPEG-1 (MPEG-1, 1991) for compression of full-motion video on digital storage media. Further, it has completed MPEG-2 (MPEG-2), a family of standards with different profiles and levels that will provide broadcast-quality TV over 2 to 15 Mbits/sec channels.

Successive image frames in a video sequence can differ because of object motion, camera motion, panning, zooming, etc. Estimating the relative motion between image frames addresses the registration problem, which involves the spatial alignment of a pair of views of a scene. This problem has long been of interest in such areas as video compression, robot vision, biomedical engineering, target detection from radar images, and remote sensing from satellite images. Motion estimation (ME) is the process of determining the movement of objects within a sequence of image frames. Motion-compensated (MC) image processing is the processing of images to account for the presence of motion.

Among the various applications for motion-compensated image processing are image interpolation, image restoration, and video coding. Accurate motion models allow for a compact description of moving imagery, and motion prediction permits high compression. Thus, it is desirable to find efficient, accurate techniques for motion estimation that are compatible with an MPEG-compliant video encoder.

Exhaustive search for the minimum mean-square error (MSE) is a common blockmatching method for obtaining the optimal motion estimate over a given search area. Since exhaustive sequential search may be computationally infeasible, many "fast" methods for "improving" video compression performance are advocated. Among these are simplified matching criteria, faster search strategies, and more flexible segmented-block motion fields. Generally, the methods known in the art either provide a statistical speedup when performing exhaustive sequential search, or deploy a sparse search strategy that reduces computation at the expense of accuracy.

Image Coding

Compression of visual data is possible because of redundant and/or irrelevant information in the video signal. Any information that can be extracted using the statistical dependence between the pixels is redundant and can be removed. Any information below a specific quality threshold is not relevant to the receiver and need not be transmitted. In the case of a human observer, the visual properties of the human eye distinguish the relevant from the irrelevant.

Image and video compression techniques exploit correlation in space for individual still image frames and in both space and time for successive video frames. Redundancy in the data that may exist due to the predictable nature of images can be removed by applying a linear transformation to create decorrelated outputs from a correlated input stream. In addition, vector quantization can be employed to achieve further packing gain (even on independent variables), remove nonlinear as well as linear dependencies, and allow coding at fractional bit rates.

Lossy methods, in which the reconstructed data are not identical to the original, generally exploit nonlinear aspects of the human visual system. Irrelevant information can be eliminated if the data can be transformed into another domain where the image information can be rearranged according to its subjective order of importance. The less important information is simply discarded by coarser quantization. In other words, a transform matched to human perception will enable the lossy quantizer to preferentially dispose of irrelevant information. The energy-packing property of unitary transforms such as the discrete-time Karhunen-Loève transform and the discrete cosine transform is useful for this subjective ordering of the information content of visual signals.

Although the primary objective in most digital image compression systems is a high fidelity reconstruction of the original signal with as few bits as possible, additional system requirements may also be imposed. For instance, if the application demands multi-resolution scalability, then linear transforms with the successive refinement property, such as wavelets and pyramid coding, are more suitable for achieving the overall goals of the system.

Coding a single frame of a video sequence exploits only the spatial correlation in the image and is called intraframe coding. Time-varying images can be transmitted more efficiently by exploiting the similarities between successive frames of a video signal. The use of temporal correlation to code a sequence of video frames is called interframe coding. Interframe coding requires storage of the frames used in the coding process. Three-dimensional transform coding over N frames in the temporal dimension involves an inherent delay, since N-1 previous frames are needed simultaneously in coding the current frame. Excessive delay and prohibitive storage requirements may limit to two or three the number of frames that can be effectively used.

When restricted to using such a small number of frames, the advantage of transform coding along the temporal dimension over waveform coding in terms of correlation reduction and energy compaction diminishes significantly. In this case, a hybrid transform/waveform coding approach is generally better. In the simplest form of hybrid coding, a two-dimensional transform is computed for each frame, and then waveform coding such as differential pulse code modulation (DPCM) is applied along the temporal dimension.

Conditional replenishment is a simple extension of DPCM that codes the difference between the current frame and the previously coded frame, but transmits this difference (which is also called the residual error or residue) only if it exceeds some threshold. The idea behind this extension is that the residue is typically very small except in the regions of the image where there is motion. At the receiver, a pixel is reconstructed by either repeating the value of the corresponding pixel location from the previous frame when the residue is below the threshold or by replenishing the previous pixel location with the decoded difference signal when the residue is above the threshold. A buffer with an appropriate buffer-control strategy is needed to smooth out the higher than average data rates in frames with large motion and the lower than average data rates in frames with little motion.
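
The reconstruction rule of conditional replenishment can be illustrated with a minimal sketch. The function name `replenish`, the list-of-rows frame layout, and the absence of quantization of the residue are assumptions made only for this illustration; they are not part of the description above.

```python
def replenish(prev_frame, curr_frame, threshold):
    """Reconstruct curr_frame from prev_frame by conditional replenishment.

    Frames are lists of rows of integer pixel values (an assumed layout).
    Returns the reconstructed frame and the number of residues transmitted.
    """
    recon = []
    sent = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        out_row = []
        for p, c in zip(prev_row, curr_row):
            residue = c - p
            if abs(residue) > threshold:
                # Residue exceeds the threshold: transmit it and replenish the pixel.
                out_row.append(p + residue)
                sent += 1
            else:
                # Residue is small: repeat the pixel from the previous frame.
                out_row.append(p)
        recon.append(out_row)
    return recon, sent
```

Only the pixels in regions with motion produce transmitted residues, which is why the data rate varies from frame to frame and a buffer is needed.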

Another possible approach for reducing the video bit rate is to simply discard some frames and then reconstruct them from the coded frames at the decoder using temporal interpolation. Unlike methods for image reconstruction, however, any method for image coding must also address the additional fundamental constraints of bit transfer rate, image quality, and computational complexity. For instance, discarding every other frame will not necessarily result in a bit rate reduction by a factor of two, since discarding a frame is likely to reduce the correlation between the two consecutive frames that are coded, causing the error signal to be higher in magnitude. That is, for the same threshold, more pixels will have to be replenished.

To overcome such limitations, motion estimation and motion compensation may be applied to video compression and predictive coding. Rather than applying temporal interpolation to improve the performance of conditional replenishment, better results are obtained by using motion-compensated prediction to predict and code the current frame based on the previously coded frame. The residual error, which is then called the displaced frame difference (DFD), is the difference between the current pixel and a displaced pixel in the previous frame. If the displacement estimate is accurate enough, then the error obtained after motion compensation will have a decreased magnitude and, as a result, fewer pixels will need to be updated at a given threshold than would be required with conditional replenishment. The quality of motion-compensated prediction depends greatly on the accuracy and robustness of the motion estimator. The estimated displacement offset or motion vector is not limited to integer values but can be interpolated to sub-pixel accuracy.
MPEG



A color TV picture requires a data transmission rate of 166 Mbits/sec. A typical color photograph requires about 3 Mbytes of storage. Compression is required to make such data rates and storage requirements manageable. Lossless compression techniques, however, can achieve at most a compression of 3 to 1. Methods that achieve high compression rates (such as 10:1 to 50:1 for images and 50:1 to 200:1 for video) require lossy techniques.

One lossy compression standard called MPEG is a representation/display standard designed to compress video sequences by up to 50:1. MPEG-1 was designed to compress non-interlaced SIF (standard interchange format) video at a data rate of 3.8 Mbytes/second to 1.5 Mbits/second or less for VHS-quality reproduction. MPEG-2 was designed to transmit broadcast quality interlaced TV video over 2 to 15 Mbits/second channels. MPEG is tailored to asymmetric applications such as electronic publishing, digital libraries, games, and program delivery that require frequent use of the decompression process and an inexpensive decoder for consumers.

The MPEG standard specifies the coded representation of picture information for digital storage media and digital video communication and specifies the decoding process. The MPEG encoder is said to be hybrid because it combines transform coding with predictive interframe coding, in which a block in the current frame is predicted from a block in the previous frame using a feedback loop. In order to have the same prediction at both the receiver and the transmitter, the decoder must always be incorporated into the encoder. In contrast, the JPEG standard for still pictures operates in an open loop that is reset at the end of each image. A block diagram of a video encoder and decoder with MC (motion-compensated) prediction, DCT (discrete cosine transform) coding of the prediction error, variable length coding, and quantization control by the buffer content is shown in Figure 1, which specifies the basic structure of the H.261, MPEG-1, and MPEG-2 encoder/decoder. The DCT coefficients are quantized until enough can be discarded to satisfy the bit rate requirements, and then lossless entropy coding is applied to the small number of remaining transform coefficients. Aside from the subsampling of the chroma components and rounding errors in the 2-D transform, quantization is the only lossy part of the compression scheme.

MPEG uses motion-compensated predictive DCT coding on some frames, and bi-directional motion-compensated interpolation on the remaining frames. The video sequence is segmented into blocks of frames, which are each called a group of pictures (GOP). The first frame in a GOP is called the intraframe coded image or I-frame and is coded using no prediction. The I-frame is used as the initial frame of a motion compensation loop that predicts every Nth frame in the GOP, where N is typically 2 or 3. The predicted frames are called P-frames. The N-1 skipped frames are interpolated along the motion trajectories using the nearest P-frames and/or I-frame as the reference (or anchor) frames. These skipped frames are called B-frames because the interpolation is bi-directional (noncausal). Both the I-frame pixels and the various prediction errors (the displaced frame differences for each P-frame and B-frame) are compressed using a JPEG-like standard involving the DCT, quantization, and entropy coding.
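
As an illustration of the GOP structure just described, the following sketch lists the frame types of one GOP in display order. The function `gop_pattern` and its parameters are hypothetical names introduced here, and the sketch assumes that any B-frames at the end of the GOP would take the first frame of the following GOP as their future reference.

```python
def gop_pattern(gop_length, n):
    """Frame types of one GOP in display order.

    Frame 0 is the I-frame, every n-th frame after it is a P-frame,
    and the n-1 frames skipped in between are B-frames.
    """
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")   # coded with no prediction
        elif i % n == 0:
            types.append("P")   # predicted from the previous I- or P-frame
        else:
            types.append("B")   # interpolated from past and future references
    return "".join(types)


print(gop_pattern(12, 3))  # IBBPBBPBBPBB
```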

Intraframe coding techniques can provide 6:1 data reduction. The I-frame to I-frame distance (i.e., the number of frames in the GOP) is about 12 to 15, which provides a random access point every half second of video for fast-forward/fast-reverse capability. Motion compensation can improve the compression ratios by an additional factor of 10. P-frames, coded with MC prediction from a single reference frame, provide 10:1 to 20:1 compression, and B-frames, coded using bi-directional MC interpolation from both past and future reference frames, can achieve 120:1 compression.

Increasing the number of B-frames between successive reference frames generally decreases the correlation between the B-frames and the reference frames as well as the correlation between the references themselves, so that the quality of the resulting interpolation decreases. B-frames, however, still contribute significantly to the overall video quality due in part to their ability to predict areas that are uncovered by moving objects. In addition, B-frames also help to limit the propagation of transmission errors since they are not themselves used for the prediction of subsequent frames. On the other hand, B-frames increase delay and picture buffer size.

The MPEG standard specifies only the coded representation (the coded bitstream syntax) of video data and the decoding process required to reconstruct the image sequence. In other words, a decoder is an MPEG decoder if it decodes an MPEG bitstream to a result that is within acceptable bounds as specified by the decoding process. Although it does not define the encoding process, the standard does define the form and content of the data that the encoder is allowed to transmit. Thus, an encoder is an MPEG encoder if it can produce a legal MPEG bitstream.

The designation "MPEG compatible" is not qualitative. Two decoders with different implementations may be theoretically equivalent in the sense that they produce the same output (as defined by the decoding process of the MPEG standard) when presented with the same input. On the other hand, two encoders with different implementations could produce different (yet legal) outputs when presented with the same source material. One of the two, however, may have a better fidelity in the decoded sequence or a lower cost of implementation. Integrated circuit manufacturers can distinguish their encoders through the consistency of their bit rate control, the adaptability of their quantization levels, the effectiveness of their preprocessing to minimize coding artifacts, and the accuracy and speed of their motion estimation method. For example, while the encoder has the responsibility to estimate motion vectors, the MPEG standard does not dictate how that estimation is to be accomplished.

Motion estimation is performed on the luminance component of a given macroblock (16 by 16 block of pixels, as specified by the MPEG standard) and the resulting motion vector is applied to the chroma channels as well. Within the framework of the MPEG standard, the motion estimation problem is to select a motion vector for the current macroblock that minimizes some cost function of the prediction error between the current block and each predictor candidate. Such details as the cost function, the search range, and whether the original or the reconstructed frame is used as the reference are left to the implementor.
Schemes for Motion Estimation

Motion compensation (MC) begins by estimating a motion field at the encoder based on pixel intensities in the current and previous frames. A basic problem of MC is that this motion field estimate must be reconstructed [...] a trade-off between decoder complexity and bit transfer rate.

MC methods can be characterized by how information about this motion field is obtained at the decoder. Two standard approaches are recursive backward MC and region-based forward MC. Recursive backward methods [...] do not require motion [...] the decoder. In region-based forward methods, the encoder transmits motion vectors to the decoder. To decrease the computational burden placed on the decoder, the MPEG standard uses forward MC. The problem of motion estimation then is to find displacement offsets for the pixel intensities of the current frame based upon [...] a previously decoded frame. In particular, it is [...]. For this reason, we will hereafter refer to the previous frame as the reference frame. The predicted frame is the approximation to the current frame [...] motion compensation to the reference frame or frames, before correcting for any [...]. P-frames are predicted from past I-frames or past P-frames, and B-frames are interpolated from a past and a future I-frame and/or P-frame. B-frames are never used as reference frames.

Blockmatching Motion Estimation 

Blockmatching methods for motion estimation consider a small region in the current frame to be predicted and search for a displacement that produces a "best match" among the candidate regions in the reference frame. In particular, the motion vector d = (d_x, d_y) is chosen so that the prediction error PE(d), which is given by

$$\mathrm{PE}(d) = \sum_{(x,y) \in V} D\bigl(f_t(x,y),\; f_{t-1}(x+d_x,\, y+d_y)\bigr),$$

is minimized. Referring to Figure 2, f_t(x,y) denotes the luminance component of a pixel at location (x,y) in the current frame 24, f_{t-1}(x,y) denotes the luminance component of a pixel at location (x,y) in the reference frame 25, V denotes the set of pixel locations in the current block 31, and D is a mismatch function that quantifies the dissimilarity between a pixel in the current block 31 and in each candidate block 27. The pivot 30 of the current block 31 is the top [...]. The zero-displacement pivot (ZDP) 29 is the pivot in the reference frame 25 that has the same pixel coordinates as the pivot 30 of the current block 31 in the current frame 24. A blockmatching motion estimate is [...] a displacement vector d such that (x_0, y_0) + d is inside the search region 28 and (x_0, y_0) is the zero-displacement pivot 29. Note that the search window 26 contains each possible candidate block 27, whereas the search region 28 contains only the pivot of each possible candidate block 27. Any M x N block of pixels in the reference frame 25 whose pivot lies within the search region 28 around the ZDP 29 is a potential candidate block 27.
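
A minimal sketch of full-search blockmatching under a generic mismatch function D follows. It assumes that the pivot is the top-left pixel of a block, that frames are lists of rows indexed as `frame[y][x]`, and that the search region is the square of offsets from -p to p around the zero-displacement pivot; the function names and parameters are hypothetical and chosen only for this example.

```python
def prediction_error(cur, ref, x0, y0, dx, dy, bw, bh, mismatch):
    """PE(d): sum of the mismatch D over all pixel locations of the current block."""
    total = 0
    for y in range(y0, y0 + bh):
        for x in range(x0, x0 + bw):
            total += mismatch(cur[y][x], ref[y + dy][x + dx])
    return total


def full_search(cur, ref, x0, y0, bw, bh, p, mismatch):
    """Return (d, PE(d)) for the displacement d = (dx, dy) with the smallest error."""
    height, width = len(ref), len(ref[0])
    best_d, best_pe = None, None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidate blocks that do not lie entirely inside the reference frame.
            if x0 + dx < 0 or y0 + dy < 0:
                continue
            if x0 + dx + bw > width or y0 + dy + bh > height:
                continue
            pe = prediction_error(cur, ref, x0, y0, dx, dy, bw, bh, mismatch)
            if best_pe is None or pe < best_pe:
                best_d, best_pe = (dx, dy), pe
    return best_d, best_pe
```

Passing `lambda a, b: (a - b) ** 2` as `mismatch` gives a squared-error criterion, and `lambda a, b: abs(a - b)` gives an absolute-error criterion.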

Among the many possible choices for the mismatch function D, the mean absolute error (MAE) is the overwhelming favorite due to its relative ease of hardware implementation, even though it suffers from poor performance. More accurate criteria such as the cross-correlation function and the mean square error (MSE) are often dismissed in the literature as too complex for real-time hardware implementation.

One approach to registering (i.e., aligning) a pair of images that differ by unknown translational motion [...] the correlation between the pair of images. When applied to the problem of motion estimation, [...] the cross-correlation function (CCF) given by

$$\mathrm{CCF}(d) = \sum_{(x,y) \in V} f_t(x,y)\, f_{t-1}(x+d_x,\, y+d_y),$$

where (referring to Figure 2) f_t(x,y) denotes the luminance component of a pixel at location (x,y) in the current frame 24, f_{t-1}(x,y) denotes the luminance component of a pixel at location (x,y) in the reference frame 25, V denotes the set of pixel locations in the current block 31, and the variable d must be such that (x_0, y_0) + d is inside the search region 28. [...] independent of relative intensity, one could instead choose a displacement d = (d_x, d_y) that yields the maximum of the normalized cross-correlation (MNCC) function given by

$$\mathrm{MNCC}(d) = \frac{\sum_{(x,y) \in V} f_t(x,y)\, f_{t-1}(x+d_x,\, y+d_y)}{\sqrt{\sum_{(x,y) \in V} f_t(x,y)^2}\;\sqrt{\sum_{(x,y) \in V} f_{t-1}(x+d_x,\, y+d_y)^2}}.$$

Note that since the norm of f_t in the denominator is not a function of d, maximizing the normalized cross-correlation function is equivalent to maximizing

$$\frac{\sum_{(x,y) \in V} f_t(x,y)\, f_{t-1}(x+d_x,\, y+d_y)}{\sqrt{\sum_{(x,y) \in V} f_{t-1}(x+d_x,\, y+d_y)^2}}.$$



The peak-to-peak signal-to-noise ratio (PSNR) is a generally accepted measure of image fidelity, so another possible approach is to maximize (by choice of d) the PSNR given by

$$\mathrm{PSNR}(d) = 20 \log_{10} \frac{255}{\sqrt{\mathrm{MSE}(d)}},$$

where 255 is the signal intensity range for 8-bit pixel values and MSE(d) is the mean square error given by

$$\mathrm{MSE}(d) = \frac{1}{MN} \sum_{(x,y) \in V} \bigl(f_t(x,y) - f_{t-1}(x+d_x,\, y+d_y)\bigr)^2.$$

Note that choosing d to maximize the PSNR is equivalent to choosing d to minimize the MSE. We will refer to such a method as an MSSD technique. The primary justification for sacrificing performance is to reduce computational complexity. The mean absolute error (MAE) [...] the MSE, both of which require multiplication. In particular, a MAE motion estimate is a displacement vector d that minimizes the MAE given by

$$\mathrm{MAE}(d) = \frac{1}{MN} \sum_{(x,y) \in V} \bigl|\, f_t(x,y) - f_{t-1}(x+d_x,\, y+d_y) \,\bigr|.$$

We will refer to such a method as a minimum sum of absolute differences (MSAD) technique.
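
The MSE, MAE, and PSNR criteria defined above can be sketched for a pair of equally sized blocks as follows. The helper names and the list-of-rows block layout are assumptions for illustration, and the base-10 logarithm reflects the usual decibel convention rather than anything stated explicitly above.

```python
import math


def mse(block_a, block_b):
    """Mean square error between two equally sized blocks (lists of rows)."""
    count = sum(len(row) for row in block_a)
    total = sum((p - q) ** 2
                for row_a, row_b in zip(block_a, block_b)
                for p, q in zip(row_a, row_b))
    return total / count


def mae(block_a, block_b):
    """Mean absolute error: no multiplication per pixel is needed."""
    count = sum(len(row) for row in block_a)
    total = sum(abs(p - q)
                for row_a, row_b in zip(block_a, block_b)
                for p, q in zip(row_a, row_b))
    return total / count


def psnr(mse_value, peak=255):
    """PSNR in dB for 8-bit pixels; infinite when the blocks are identical."""
    if mse_value == 0:
        return math.inf
    return 20 * math.log10(peak / math.sqrt(mse_value))
```

Because `psnr` is a strictly decreasing function of the MSE, the displacement that minimizes the MSE is also the one that maximizes the PSNR.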

Since the possibility of computing the MSE is generally dismissed by [...], MAE has replaced the MSE as the baseline for reporting performance comparisons in the literature. That is, the performance in terms of the PSNR of newer techniques is usually compared to full-search MAE. [...] As a result, it is possible to outperform full-search MAE on certain video sequences, but without a comparison to full-search MSE it is difficult to determine how much performance in terms of the PSNR is being sacrificed.
[...] of pixels [...] to a best match is called a minimum pixel difference classification (MPDC) method. In such a method, each pixel is classified as either a matching pixel or a mismatching pixel [...] to match if their MAE is below the [...]. [...] region is the block with the largest number of matching pixels. This method has been shown to outperform the MSAD method while reducing the computational complexity. In fact, using the PSNR of the motion compensated frame as the performance measure, the MPDC method has been [...] to that of the MSSD method and significantly better than that of the MSAD method. [...] performance between MSAD and MPDC becomes wider. Further, when [...] sequence, the PSNR for MSSD and MPDC continues to improve yet declines for MSAD. [...] search region should increase the ability of a motion compensated predictor to track a quickly moving object [...] displacements. This conflicting behavior is a clear indication of the suboptimality of [...] MSAD [...] relative ease of implementation despite the problems of the method, particularly when faced with quickly moving objects.
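
The pixel classification idea behind MPDC can be sketched as below. This is only an illustration under stated assumptions: a pixel pair is taken to match when its absolute difference is below a caller-supplied threshold, candidates are given as a plain list of blocks, and the names `matching_pixels` and `mpdc_best` are hypothetical.

```python
def matching_pixels(cur_block, cand_block, threshold):
    """Count pixel pairs whose absolute difference is below the threshold."""
    return sum(1
               for cur_row, cand_row in zip(cur_block, cand_block)
               for p, q in zip(cur_row, cand_row)
               if abs(p - q) < threshold)


def mpdc_best(cur_block, candidates, threshold):
    """Return (index, count) of the candidate with the most matching pixels."""
    best_index, best_count = None, -1
    for i, cand in enumerate(candidates):
        count = matching_pixels(cur_block, cand, threshold)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count
```

Each pixel contributes a single comparison and an increment, so no multiplication or accumulation of differences is required.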

Search Schemes and Computational Complexity 

A full-search blockmatching algorithm (FSBMA) evaluates the prediction error for each pivot within a given search range and selects a pivot with the smallest error. Efficient sequential search strategies can provide a significant [...]. [...] sparse-search blockmatching algorithms (SSBMA) use some [...] blocks within the search window are not considered to be candidates [...] to reduce the computations involved in evaluating the mismatch function.

[...] number of searches (i.e., evaluations of the prediction error) required [...] window 26 of pixels in the reference frame 25 is given by [...].

[...] be made until the mismatch function has been computed for all possible [...] of computation is the same for all degrees of misregistration [...]. [...] search in which the evaluation of the mismatch function [...] displacement is terminated whenever the accumulated prediction error exceeds the best-so-far minimum, at which point the search continues with the next candidate displacement.
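
Early termination of the mismatch evaluation can be sketched as follows. The sketch assumes an absolute-difference mismatch, checks the accumulated error once per row of the block, and takes the candidate blocks as a plain list; all names are hypothetical.

```python
def error_with_early_exit(cur_block, cand_block, best_so_far):
    """Accumulate the prediction error row by row.

    Returns None as soon as the running total exceeds best_so_far,
    otherwise the complete prediction error.
    """
    total = 0
    for cur_row, cand_row in zip(cur_block, cand_block):
        for p, q in zip(cur_row, cand_row):
            total += abs(p - q)
        if total > best_so_far:
            return None  # this candidate cannot beat the current minimum
    return total


def sequential_search(cur_block, candidates):
    """Exhaustive search over all candidates with early termination."""
    best_index, best_error = None, float("inf")
    for i, cand in enumerate(candidates):
        err = error_with_early_exit(cur_block, cand, best_error)
        if err is not None and err < best_error:
            best_index, best_error = i, err
    return best_index, best_error
```

The result is identical to that of a full evaluation of every candidate; only the amount of work changes, and it depends on how early a good candidate is encountered.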

[...] of processing time for the standard pattern in which [...] separated by two B-frames. The early termination approach described above results in a 50% speedup. Note, however, that this is a statistical speedup. That is, the sooner a good estimate is found with its correspondingly low cost, the less computation is required to discard the remaining poor estimates.
[...] be obtained by sequential search methods is largely dependent upon the success [...] estimate the true displacement, the [...] search can be terminated. [...] a spiral search centered around the [...] displacement point [...] a 30% speedup over a rectangular (raster-scan order) search. Further, an [...]. This is particularly true when the motion is caused by a camera panning a scene.

The methods considered above have each used an exhaustive search of all pivots within the search region. Efficient [...] early termination and spiral search provide a statistical speedup that may reduce the computational requirements to a manageable level when averaged over a long video sequence. Still, for any given macroblock, there is no guarantee that the best motion vector will be located in less time than it takes for [...].

Sparse-search blockmatching algorithms (SSBMA) attempt to provide a nearly fixed speedup in computation by considering only a subset of possible offsets within the search region. Each of these methods is based on an assumption that the mismatch function (typically MAE) increases monotonically as the candidate offset moves away from the [...] the search can [...] of accuracy. The methods to prune the search space are heuristic and thus may naturally be expected to experience difficulties in certain cases.

SUMMARY OF THE INVENTION 

The present invention utilizes a blockmatching motion estimation method using mean-square error (MSE) as the matching criterion. Making use of fast methods for multidimensional convolutions, a new method for computing motion [...]. An improved motion estimate in a worst-case run time may be obtained that is comparable to the average run time of statistical speedup methods and to the run time achieved by sparse search methods that sacrifice estimation [...].
More particularly, in one embodiment of the invention a method of obtaining motion estimation vectors for a block-based video compression scheme is disclosed. The method may include partitioning the current frame and a reference frame into blocks and selecting a current block from the current frame. Further, the method may include calculating and storing in a table the convolution of the current block and the reference frame and selecting for the current block a set of reference blocks from the reference frame. Then, a reference block may be searched for from the set of reference blocks such that the desired reference block is identified as having a mean square error with the current block that is no larger than a mean square error between the current block and any other reference block of the set of reference blocks. The set of reference blocks may include each block that is within the reference frame.

The method may further include locating from the table, for a plurality of reference blocks, the inner product of the reference block and the current block. Also, for each reference block of the plurality of reference blocks, the squared norm of the reference block may be calculated. The sum of the squared norm and the negative of twice the inner product may also be calculated. Then, a reference block from the plurality of reference blocks [...] having a sum that is no larger than the sum obtained for any other reference block of the plurality of reference blocks. The squared norm may be calculated by computing a running sum of squares within the reference frame. The running sum of squares may be obtained by dividing the reference frame into overlapping rows of pixels; summing the squared norms of the pixels in each column of the rows using, for each row beyond the first, [...] row to aid in the computation; and summing the sums of the columns in each row to find the sum of squares of each reference block in the reference frame using, for each reference block beyond the first, the sums from previous reference blocks to aid in the computation. Finally, the method may also include utilizing a fast numerical method to compute the convolution of the current block and the reference [...] block and the reference frame do not completely overlap, and arranging the remaining terms from the convolution in a table.
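
The selection rule above follows from expanding the squared error between the current block a and a reference block b as ||a - b||^2 = ||a||^2 - 2<a, b> + ||b||^2, where ||a||^2 does not depend on the reference block. A minimal sketch of this decomposition is given below. It is an illustration only: the inner-product table is filled by direct summation rather than by the fast convolution procedure referred to above, the running sum of squares is one possible arrangement of the row and column updates, and all function names are hypothetical.

```python
def squared_norm_table(ref, bw, bh):
    """Sum of squares of every bw-by-bh block of ref, using running sums."""
    height, width = len(ref), len(ref[0])
    # Column sums of squares over the first band of bh rows.
    col = [sum(ref[y][x] ** 2 for y in range(bh)) for x in range(width)]
    table = []
    for top in range(height - bh + 1):
        if top > 0:
            # Slide the band down: add the entering row, drop the leaving row.
            for x in range(width):
                col[x] += ref[top + bh - 1][x] ** 2 - ref[top - 1][x] ** 2
        row = [sum(col[:bw])]
        for left in range(1, width - bw + 1):
            # Slide the block right: add the entering column, drop the leaving one.
            row.append(row[-1] + col[left + bw - 1] - col[left - 1])
        table.append(row)
    return table


def inner_product_table(cur_block, ref):
    """Inner product of cur_block with every fully overlapping block of ref.

    Direct summation stands in for the fast convolution here.
    """
    bh, bw = len(cur_block), len(cur_block[0])
    height, width = len(ref), len(ref[0])
    return [[sum(cur_block[j][i] * ref[top + j][left + i]
                 for j in range(bh) for i in range(bw))
             for left in range(width - bw + 1)]
            for top in range(height - bh + 1)]


def best_match(cur_block, ref):
    """Return ((left, top), score) minimizing ||b||^2 - 2<a, b> over all blocks b."""
    bh, bw = len(cur_block), len(cur_block[0])
    norms = squared_norm_table(ref, bw, bh)
    inner = inner_product_table(cur_block, ref)
    best_pos, best_score = None, None
    for top, (norm_row, inner_row) in enumerate(zip(norms, inner)):
        for left, (norm, ip) in enumerate(zip(norm_row, inner_row)):
            score = norm - 2 * ip  # equals ||a - b||^2 minus the constant ||a||^2
            if best_score is None or score < best_score:
                best_pos, best_score = (left, top), score
    return best_pos, best_score
```

Since the omitted term ||a||^2 and the 1/(MN) scale are the same for every candidate, the block selected by `best_match` is also a minimum-MSE block.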

In another embodiment of the present invention, a method of obtaining motion vectors in an MPEG encoder is pro-
vided. The method includes selecting a P-frame or I-frame from a group of pictures to be the reference frame and
selecting a B-frame or P-frame from the group of pictures to be the current frame. Then, the current frame and the
reference frame may be partitioned into blocks and a current block from the current frame selected. The
convolution of the current block in the reference frame may be calculated and stored in a table. The method further
includes selecting for the current block a set of reference blocks from the reference frame and searching for a reference
block from the set of reference blocks having a mean square error with the current block that is no larger than a mean
square error between the current block and any other reference block of the set of blocks.

Another embodiment of the present invention includes a data compression method of finding a desired data block
from a set of data blocks within a frame of image data that has a minimum mean square error with a
fixed data block. The method includes calculating and storing in a table a convolution of the fixed data block
with the frame of image data and locating from the table the inner product of each data block of the set of data blocks
and the fixed data block. Further, the method includes calculating for each data block of the set of data blocks the sum
of the squared norm of each data block and the negative of twice the inner product. Finally, the method includes
selecting the desired data block from the set of data blocks for which the sum is no larger than the sum obtained for
any other data block of the set of data blocks.

In yet another embodiment of the present invention, a method for compressing moving image data using motion
estimation vectors is provided. This method includes providing a reference image frame which may be divided into a
plurality of candidate sub-images. As used herein, a sub-image may be the entire image or some lesser portion of the
image. Further, the method includes providing a current image frame which is divided into a plurality of current frame
sub-images. The method also includes selecting a reference frame sub-image that corresponds to one of the current
frame sub-images by identifying at least one candidate sub-image that provides a minimum mean square error between
at least one of the candidate sub-images and at least one of the current frame sub-images. The mean square error
may be determined by performing a calculation comprising a running sum of squares and a convolution. The convolution
may be a truncated convolution of the current block and the reference frame. Furthermore, the minimum mean square
error may be calculated over a predetermined search region.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a typical MPEG encoder/decoder. 
Figure 2 shows the geometry of a block matching search scheme. 
Figure 3 presents a table comparing the computational complexity of three different FSBMA methods.
Figure 4 shows, for the one-dimensional case, how the candidate blocks are formed by sliding the search window 
past the current block. 

Figure 5 shows the truncated convolution sum alignments for the one-dimensional case. 

Figure 6 shows the computational complexity of two-dimensional convolution per macroblock for seven different 
methods of two-dimensional convolution. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention utilizes a blockmatching motion estimation method using mean-square error (MSE) as the 
matching criterion. Making use of fast methods for multidimensional convolutions, a new method for computing motion
estimates with the MSE criterion has been developed that overcomes that method's daunting computational complexity.
Thus, improved motion estimation may be obtained in a worst-case run time that is comparable to the average run
time of statistical speedup methods and to the run time achieved by sparse search methods that sacrifice estimation 
accuracy. 

The method of the present invention may be used with the MPEG video compression standard. An overview of an
apparatus for use with this standard, which is more fully described in the prior art, is given in Figure 1. The source
image 19 enters the motion estimator 1 and the frame memory 2 of an encoder 17. The motion estimator 1 produces
motion vectors based on the source image 19 and a previous image stored in frame memory 2 (prior to its replacement
by source image 19). These motion vectors are sent to the motion compensator 3 and the variable length coder (VLC)
9. The prediction error between the source image 19 and the output of the motion compensator 3 is subjected to a
discrete cosine transform (DCT) encoder 5. The resulting coefficients are sent to a quantizer 7 and the resulting quan-
tized and encoded displaced frame difference (DFD) is sent to the variable length coder 9 (the output of this quantizer
is also used to form the reconstructed image that in turn is used to calculate the prediction error). The output from the
VLC 9 is sent through a buffer 10 to the decoder 18. The decoder 18 includes a buffer 11 and decoder blocks 12, 13,
and 14. The input to the decoder 18 is decoded to reconstruct the motion vectors 21 and the reconstructed image 20,
which is stored in frame memory 16 and used by the motion compensator 15 to form the next reconstructed image.

Although the present invention is not restricted to the two-dimensional case, its use with the MPEG compression
standard would generally occur in a two-dimensional setting. Before considering the two-dimensional case, however,
its use in a one-dimensional setting will be described. Based upon this discussion it will become clear how the method
may be applied to higher dimensions. 

The One-Dimensional Case 

Consider a one-dimensional MSSD problem in which one seeks a vector u from a set of candidate vectors within
a reference vector in such a way that the mean square error between u and some current vector v is minimized. In the
reference vector, define a search window of length W = M + 2P centered about the zero displacement pivot (ZDP),
where M is the length of vectors u and v and P is the maximum trial offset (in either direction) from the ZDP for a
candidate vector considered in the search. 

Consider the blockmatching process applied to some vector v and a set of candidate vectors obtained by sliding
a window of length M over a search window of length W. Figure 4 depicts the situation when M = 4 and P = 2, where
the pixels (denoted by $v_i$) of the vector v are shown at a fixed location in the first row while the remaining rows depict
the pixels (denoted by $r_i$) of the search window shifted left or right corresponding to the value of p in the left column.
On any given row, the portion of the search window falling within the sliding window (i.e., the displaced pixels of the
search window that overlap the extent (or support) of v) represents the candidate vector u used for the MC prediction
of v given that particular offset from the ZDP.

A candidate vector is given by any M consecutive values within the search window $R = \{r_i : i = 0, 1, \ldots, W-1\}$. That
is, a candidate vector $u_p$ has the form

$$u_p = [\, r_{P+p},\; r_{P+p+1},\; \ldots,\; r_{P+p+M-1} \,]$$

where the trial offset value p is an integer such that $-P \le p \le P$. For example, an offset value of p = 0 corresponds to
the candidate whose pivot is equal to the ZDP. In this case, the sliding window is centered within the search window
and the candidate under consideration is $u_0 = [\, r_P, r_{P+1}, \ldots, r_{P+M-1} \,]$ (see Figure 4, middle row).
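The way candidates are cut out of the search window can be sketched in a few lines of Python. This is only an illustration: the function name `candidates` is hypothetical, and it assumes the search window is indexed from 0 to W-1 with the zero-displacement candidate centered, as in the Figure 4 example with M = 4 and P = 2.

```python
def candidates(R, M, P):
    """Map each trial offset p in [-P, P] to its candidate vector.

    Assumption: R is indexed 0..W-1 with W = M + 2P, and the p = 0
    candidate is the one centered in the search window.
    """
    assert len(R) == M + 2 * P
    return {p: R[P + p : P + p + M] for p in range(-P, P + 1)}


# Stand-in pixel values r_0..r_7 for the M = 4, P = 2 case.
R = list(range(8))
cands = candidates(R, M=4, P=2)
```

With these stand-in values each candidate simply lists the indices of the search-window pixels it covers, which makes the 2P + 1 overlapping candidates easy to inspect.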

The blockmatching process consists of computing the cost, such as the sum of squared differences (SSD), cor-
responding to each of the 2P + 1 candidate blocks, and then choosing a candidate with the smallest cost. To simplify
notation, we will (as indicated by Figure 4) re-index the search window so that its ZDP coincides with the leftmost pixel
of R; in that case, all trial offsets p' are nonnegative integers in the range $0 \le p' \le 2P$. The true displacement p relative
to the original ZDP is then p = p' - P. Consider the sum of squared differences SSD(p') for a
(re-indexed) offset p', where p' = 0, 1, ..., 2P. Note that

$$\mathrm{SSD}(p') = \sum_{i=0}^{M-1} \left( v_i - r_{i+p'} \right)^2 = \sum_{i=0}^{M-1} v_i^2 + \sum_{i=0}^{M-1} r_{i+p'}^2 - 2 \sum_{i=0}^{M-1} v_i\, r_{i+p'}.$$



That is, SSD(p') is a sum of three terms. The first term (the squared norm of the desired vector, which is denoted
by $\|v\|^2$) does not depend on the offset p'. The second term is the squared norm of the candidate vector, which
is denoted by $\|u_{p'}\|^2$. The third term in the sum is, up to the factor of -2, the cross correlation function of the
desired vector v and the candidate vector $u_{p'}$. This third term is very similar to a convolution sum, where we recall
that the convolution of two sequences x[n] and y[n] is denoted by x * y and is defined by

$$(x * y)[n] = \sum_{k} x[k]\, y[n-k].$$

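Thus the value of the third term in the previous expression for SSD(p') is given by one term in the convolution of
$\{v_i : 0 \le i < M\}$ and $\{r_k : 0 \le k < W\}$. Note that, as in the convolution sum, we have reversed one of the sequences.
In particular, define $r'_i = r_{W-1-i}$ and let $R' = \{r'_i : i = 0, \ldots, W-1\}$. Further, let y denote the convolution of v
and R'. That is, for $0 \le j < M + W - 1$,

$$y[j] = \sum_{k=0}^{M-1} v_k\, r'_{j-k},$$

where $r'_i$ is taken to be zero for indices outside $0 \le i \le W-1$.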

Consider the case when M = 4 and P = 2. The alignments of the sequences used to obtain the convolution y[j] for
each particular value of j are shown in Figure 5. Although the convolution yields an output sequence of length M + W -
1, we are interested only in the terms of y[j] such that $M-1 \le j \le W-1$. (That is, for a search window of size M + 2P,
we are only interested in the 2P + 1 terms for which the two sequences completely overlap.) In this example,
only five of the eleven nonzero values in the convolution sum correspond to offsets of interest, namely

y[3], y[4], ..., y[7].
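The one-dimensional argument can be checked numerically. The following sketch, which assumes NumPy and uses hypothetical function names and random stand-in pixel values, computes the SSD for every fully overlapping offset both directly and from the three-term expansion, taking the cross term from the retained outputs y[M-1] through y[W-1] of the convolution of v with the reversed search window.

```python
import numpy as np


def ssd_direct(v, R):
    """SSD between v and each of the W - M + 1 fully overlapping candidates."""
    M, W = len(v), len(R)
    return np.array([np.sum((v - R[p:p + M]) ** 2) for p in range(W - M + 1)])


def ssd_via_convolution(v, R):
    """Same SSD values from ||v||^2 + ||u||^2 - 2 <v, u>."""
    M, W = len(v), len(R)
    v = v.astype(float)
    R = R.astype(float)
    y = np.convolve(v, R[::-1])      # full linear convolution, length M + W - 1
    # Keep y[M-1] .. y[W-1]; y[W-1-p] pairs with re-indexed offset p.
    cross = y[M - 1:W][::-1]
    # Running sum of squares of the search window gives every ||u||^2.
    cumulative = np.concatenate(([0.0], np.cumsum(R ** 2)))
    norms = cumulative[M:] - cumulative[:-M]
    return np.sum(v ** 2) + norms - 2.0 * cross


rng = np.random.default_rng(0)
v = rng.integers(0, 256, size=4)     # M = 4
R = rng.integers(0, 256, size=8)     # W = M + 2P with P = 2
direct = ssd_direct(v, R)
fast = ssd_via_convolution(v, R)
best_offset = int(np.argmin(fast))   # re-indexed offset p' of the best match
```

For such short vectors `np.convolve` is not faster than the direct loop; the point is only that the retained convolution outputs reproduce the cross terms exactly.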

The Two-Dimensional Case

The analysis of the one-dimensional situation can be extended to two dimensions in a straightforward manner. In
particular, in two dimensions, the search window R is a block of size (M + 2P) x (N + 2Q), where the desired block v
is of size M x N, and P and Q are the maximum offsets in each direction. (See Figure 2.) For the two-dimensional
situation, we seek to minimize

$$\mathrm{SSD}(p,q) = \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ v(m,n) - u(m+p,\, n+q) \right]^2$$

by choice of the offset (p,q), where u denotes pixel intensities in the reference frame and v denotes pixel intensities in
the current frame. As before, the search is restricted to offsets that provide complete overlap between the current block
and the translated reference block. A match is said to exist at offset (p,q) if the value of SSD at (p,q) is not larger than
the value of SSD at any other candidate block in the search region. Again, we note that SSD is a sum of three terms.
In particular,

$$\mathrm{SSD}(p,q) = \sum_{m=1}^{M} \sum_{n=1}^{N} v(m,n)^2 + \sum_{m=1}^{M} \sum_{n=1}^{N} u(m+p,\, n+q)^2 - 2 \sum_{m=1}^{M} \sum_{n=1}^{N} v(m,n)\, u(m+p,\, n+q).$$

As before, note that the first term does not depend on the offset (p,q). Also, note that the third term in the summation
is the cross correlation function of the current block with the reference block evaluated at (p,q). In the classical problem
of template matching, the second term in the summation is assumed to vary rather slowly over the reference image
and thus is regarded as a constant function of the offset. With that assumption, the matching problem is solved by
maximizing the cross correlation function. That is, under this assumption we recover the MCC method considered
earlier. In contrast, the MSSD criterion does not assume that this second term is constant. In particular, the MSSD
criterion seeks an offset (p,q) that minimizes

$$\sum_{m=1}^{M} \sum_{n=1}^{N} u(m+p,\, n+q)^2 - 2 \sum_{m=1}^{M} \sum_{n=1}^{N} v(m,n)\, u(m+p,\, n+q).$$

As with the one-dimensional case, computing a two-dimensional convolution of the current block with a reversed version
of the search window will yield the term

$$\sum_{m=1}^{M} \sum_{n=1}^{N} v(m,n)\, u(m+p,\, n+q)$$

for all possible offsets (p,q) of interest.
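To make the role of the two offset-dependent terms concrete, the sketch below (NumPy assumed; function names, array layout with rows first, and the random test data are all illustrative choices) evaluates the full SSD and the reduced cost, squared norm minus twice the inner product, at every fully overlapping offset and confirms that they differ only by the constant squared norm of the current block.

```python
import numpy as np


def ssd_map(v, R):
    """Direct SSD between block v and every fully overlapping candidate in R."""
    n_rows, n_cols = v.shape
    out = np.empty((R.shape[0] - n_rows + 1, R.shape[1] - n_cols + 1))
    for q in range(out.shape[0]):
        for p in range(out.shape[1]):
            out[q, p] = np.sum((v - R[q:q + n_rows, p:p + n_cols]) ** 2)
    return out


def reduced_cost_map(v, R):
    """||u||^2 - 2 <v, u> for every candidate u (offset-dependent terms only)."""
    n_rows, n_cols = v.shape
    out = np.empty((R.shape[0] - n_rows + 1, R.shape[1] - n_cols + 1))
    for q in range(out.shape[0]):
        for p in range(out.shape[1]):
            u = R[q:q + n_rows, p:p + n_cols]
            out[q, p] = np.sum(u * u) - 2.0 * np.sum(v * u)
    return out


rng = np.random.default_rng(1)
v = rng.integers(0, 256, size=(4, 4)).astype(float)   # current block
R = rng.integers(0, 256, size=(8, 8)).astype(float)   # search window
full = ssd_map(v, R)
reduced = reduced_cost_map(v, R)
best_full = np.unravel_index(np.argmin(full), full.shape)
best_reduced = np.unravel_index(np.argmin(reduced), reduced.shape)
```

Both maps select the same offset, so a search may drop the first term without changing the match.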

Fast MSSD Techniques

The MSSD criterion thus requires the computation of a sum of squares and a cross correlation for
each candidate block. Although the current frame is
partitioned into blocks, the sum of squares can be computed for all candidate blocks in the reference frame at once rather
than for each individual search window separately. We will first consider an efficient technique for computing this running
sum of squares, and then consider methods for computing the cross correlation term.

Suppose that the size of the image frame is K x L, the size of a macroblock is M x N, and f(m,n) is the
intensity of the pixel with coordinate (m,n) in the reference frame. We begin by dividing the entire image into (L - N +
1) overlapping row strips, each containing N rows of pixels. The following procedure yields the sum of squares
of all blocks in the image frame.

Step 1. Compute the squared norm of each pixel in the image frame in KL multiplications.

Step 2. For the first row strip, save the sum of each column as $C_{1,1}, C_{1,2}, \ldots, C_{1,K}$. This step requires K(N-1) additions.

Step 3. For the second row strip, the sums $C_{2,k}$ may be found by noting that $C_{2,k} = C_{1,k} - |f(k,1)|^2 +
|f(k,N+1)|^2$. This operation requires 2K additions. Find the column sums for the remaining row strips in a
similar manner, for a total of 2K(L - N) additions for row strips 2 through L - N + 1.

Step 4. For the first block in the first row strip, the sum of squares is given by $SQ_{1,1} = C_{1,1} + C_{1,2} + \ldots + C_{1,M}$ in M-1
adds. The sum of squares for the second block in the row is $SQ_{1,2} = SQ_{1,1} - C_{1,1} + C_{1,M+1}$, and similarly for
the remaining blocks in the row strip, $SQ_{1,k} = SQ_{1,k-1} - C_{1,k-1} + C_{1,k+M-1}$. Each row strip requires (M-1)
+ 2(K-M) additions. Thus, this step requires (L-N+1)((M-1)+2(K-M)) additions.
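Steps 1 through 4 can be sketched as follows. This is a minimal illustration, assuming NumPy, zero-based indexing, and a frame stored as an array with L rows and K columns; the function names are hypothetical, and the brute-force routine is included only to check the result.

```python
import numpy as np


def block_sums_of_squares(f, M, N):
    """Sum of squared pixels for every M-wide, N-tall block of frame f."""
    sq = f.astype(np.float64) ** 2              # Step 1: square every pixel
    L, K = sq.shape
    SQ = np.empty((L - N + 1, K - M + 1))
    C = sq[:N].sum(axis=0)                      # Step 2: column sums, first strip
    for s in range(L - N + 1):
        if s > 0:
            # Step 3: drop the row that left the strip, add the row that entered.
            C = C - sq[s - 1] + sq[s + N - 1]
        SQ[s, 0] = C[:M].sum()                  # Step 4: first block in the strip
        for k in range(1, K - M + 1):
            # Step 4: slide one column to the right.
            SQ[s, k] = SQ[s, k - 1] - C[k - 1] + C[k + M - 1]
    return SQ


def block_sums_brute_force(f, M, N):
    """Reference implementation that sums each block independently."""
    sq = f.astype(np.float64) ** 2
    L, K = sq.shape
    return np.array([[sq[s:s + N, k:k + M].sum() for k in range(K - M + 1)]
                     for s in range(L - N + 1)])


rng = np.random.default_rng(2)
frame = rng.integers(0, 256, size=(12, 10))     # small stand-in frame, L=12, K=10
SQ = block_sums_of_squares(frame, M=4, N=3)
SQ_ref = block_sums_brute_force(frame, M=4, N=3)
```

The inner loops are written out to mirror the operation counts in the text; a vectorized version would normally be preferred in practice.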

Thus, the total number of additions T required to find the sum of squares of all
blocks in the reference frame is

$$T = K(N-1) + 2K(L-N) + (L-N+1)\bigl((M-1) + 2(K-M)\bigr)$$

$$\phantom{T} = 4KL - K(3N-1) - L(M+1) + (N-1)(M+1).$$

This computation can be viewed as a search overhead for the current frame. The total number of macroblocks to be
predicted in the entire image frame is B = (K/M)(L/N). The overhead per macroblock (T/B) is then MN multiplications and

$$MN \left[ 4 - \frac{3N-1}{L} - \frac{M+1}{K} + \frac{(N-1)(M+1)}{KL} \right]$$

additions, or assuming square geometries with K = L and M = N, the overhead per macroblock is $N^2$ multiplications and

$$N^2 \left[ 4 - \frac{4N}{K} + \frac{N^2-1}{K^2} \right]$$

additions. Since K and L are typically much larger than M and N, the overhead is roughly five operations per macroblock
pixel. Considering that each block matching evaluation takes on the order of 2MN operations, the computation overhead
for each macroblock is less than that required for direct evaluation of the cost function at three search locations.
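The addition count and the per-pixel overhead are easy to check numerically. In the sketch below the function names are hypothetical, and the frame size of 352 x 288 with 16 x 16 macroblocks is an illustrative assumption, not a value taken from the text.

```python
def additions_stepwise(K, L, M, N):
    """Additions for Steps 2, 3 and 4 of the running sum of squares."""
    return K * (N - 1) + 2 * K * (L - N) + (L - N + 1) * ((M - 1) + 2 * (K - M))


def additions_closed_form(K, L, M, N):
    """The same count after expanding and collecting terms."""
    return 4 * K * L - K * (3 * N - 1) - L * (M + 1) + (N - 1) * (M + 1)


def overhead_per_pixel(K, L, M, N):
    """(multiplications, additions) per macroblock pixel; K, L divisible by M, N."""
    blocks = (K // M) * (L // N)
    mults = K * L / blocks / (M * N)
    adds = additions_stepwise(K, L, M, N) / blocks / (M * N)
    return mults, adds


# Illustrative frame and macroblock sizes (assumed for this example only).
mults, adds = overhead_per_pixel(K=352, L=288, M=16, N=16)
total_per_pixel = mults + adds
```

For frames much larger than the macroblock, the total approaches one multiplication and four additions per pixel, in line with the estimate of roughly five operations.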

The fast MSSD technique consists of computing the sum of squares overhead once per reference frame and a 
single two-dimensional truncated convolution for each macroblock. It is this truncated convolution term that we will 
consider next. To compute the truncated convolution we can employ fast linear convolution methods and discard the
"tails" at the four edges of the output array, retaining only the terms from columns M-1 through M+2P-1 and rows N-1
through N+2Q-1. We can also use fast cyclic convolution techniques and discard the first M-1 columns and N-1 rows (the
"wrap-around" terms) of the output. Alternatively, provisions can be made in the implementation of the convolution 
method to ensure that only the necessary terms of the truncated convolution are computed, allowing even greater 
savings in computation over techniques for general convolution. 
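The cyclic-convolution variant can be sketched with NumPy's FFT routines. This is an illustration under stated assumptions rather than the implementation evaluated in Figure 6: it uses a plain complex FFT of the search-window size, hypothetical function names, arrays stored rows first, and random stand-in data. The wrap-around rows and columns are discarded, and the surviving terms are reordered so that entry (q, p) holds the inner product at offset (p, q).

```python
import numpy as np


def inner_products_fft(v, R):
    """<v, u> for every fully overlapping candidate u of search window R."""
    n_rows, n_cols = v.shape
    reversed_R = R[::-1, ::-1]
    # Two forward transforms and one inverse transform; v is zero-padded
    # to the size of R, so the product gives a cyclic convolution.
    spectrum = np.fft.fft2(v, s=R.shape) * np.fft.fft2(reversed_R)
    cyclic = np.fft.ifft2(spectrum).real
    # Discard the wrap-around terms: first n_rows-1 rows, n_cols-1 columns.
    valid = cyclic[n_rows - 1:, n_cols - 1:]
    # Output index j corresponds to offset (size - 1 - j), so flip both axes.
    return valid[::-1, ::-1]


def inner_products_direct(v, R):
    """Reference computation by explicit summation."""
    n_rows, n_cols = v.shape
    return np.array([[np.sum(v * R[q:q + n_rows, p:p + n_cols])
                      for p in range(R.shape[1] - n_cols + 1)]
                     for q in range(R.shape[0] - n_rows + 1)])


rng = np.random.default_rng(3)
v = rng.integers(0, 256, size=(3, 4)).astype(float)    # current block
R = rng.integers(0, 256, size=(7, 10)).astype(float)   # search window
via_fft = inner_products_fft(v, R)
via_sum = inner_products_direct(v, R)
```

Combining this map with the precomputed sums of squares gives the reduced MSSD cost at every offset without visiting each candidate pixel by pixel.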

Figure 6 shows the computational complexity per 16 x 16 macroblock for seven two-dimensional convolution 
methods using several window sizes typical for MPEG motion estimation. The present invention may be utilized with 
any of a variety of convolution techniques. Numerous convolution techniques are known in the art. For example, Fast
Algorithms for Digital Signal Processing by R. Blahut (Addison-Wesley, Reading, MA, 1985) and "'Split radix' FFT
algorithm" by P. Duhamel and H. Hollmann (Electronics Letters, Vol. 20, No. 1, Jan. 1984, pp. 14-16) disclose various
convolution techniques, the disclosures of which are incorporated herein by reference. We assume in Figure 6 that
complex multiplications are computed using three real multiplications and three real additions. The total operation
counts for frequency-domain methods are greater than those cited in most texts. This discrepancy occurs because
published figures generally assume that one sequence is a fixed filter, and so its Fourier transform can be precomputed.
Thus, only one forward transform (on the data sequence) is needed, followed by the pointwise products and one inverse
transform. For our motion estimation problem, the "filter" is the current macroblock to be predicted, and so is fixed for 
only one convolution. Thus, the operation counts in Figure 6 reflect the complexity of computing three Fourier transforms 
(two forward and one inverse) along with the pointwise products. Also, although not considered in Figure 6, the addi- 
tional overhead for the running sum of squares must be taken into account as well. 
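One well-known way to multiply two complex numbers with three real multiplications is sketched below. Whether this exact arrangement is the one assumed for Figure 6 is not stated in the text; the function names are hypothetical. The count of three real additions holds when two combinations of one operand (for example a fixed twiddle factor) are precomputed; otherwise five additions are needed.

```python
def precompute(c, d):
    """Precompute the combinations of the fixed operand c + i*d."""
    return c, d - c, c + d


def complex_multiply_3m(a, b, pre):
    """(a + i*b) * (c + i*d) using 3 real multiplications and 3 real additions."""
    c, d_minus_c, c_plus_d = pre
    k1 = c * (a + b)          # addition 1, multiplication 1
    k2 = a * d_minus_c        # multiplication 2
    k3 = b * c_plus_d         # multiplication 3
    return k1 - k3, k1 + k2   # additions 2 and 3: real part, imaginary part


pre = precompute(0.5, 4.0)
real, imag = complex_multiply_3m(3.0, -2.0, pre)
reference = complex(3.0, -2.0) * complex(0.5, 4.0)
```

Expanding the products shows that k1 - k3 equals ac - bd and k1 + k2 equals ad + bc, which are the real and imaginary parts of the ordinary product.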

With the present invention, blockmatching using a mean square mismatch function becomes a viable and compu-
tationally attractive motion estimation technique. The computational complexity using the fast MSSD method for N x
N macroblocks within a search window of size W x W is less than the order of $3W^2 \log W$ multiplications and $9W^2 \log W$
additions. For comparison, exhaustive sequential search using MSSD requires on the order of $N^2 W^2$ multiplications
and $2N^2 W^2$ additions, while MSAD requires on the order of $3N^2 W^2$ additions. Using the split-radix FFT, the present
invention achieves a 78% reduction in the total number of operations as compared to full-search MSAD when N = 16
and W = 32, while also yielding a better quality of prediction than that provided by MAE. For W = 64, the total com-
putation is reduced by 88%. Further, these reductions in computational complexity are fixed and independent of the
data, unlike the statistical improvements offered by other methods, and the search for the best match is exhaustive,
unlike other methods that reduce computation at the expense of accuracy by searching only a sparse subset of pixels
and/or candidate offsets within the search space. The present invention will enable the implementation of MPEG video
encoders with optimal real-time motion estimation capability.

Further modifications and alternative embodiments of this invention will be apparent to those skilled in the art in
view of this description. Accordingly, this description is to be construed as illustrative only and is for the purpose of
teaching those skilled in the art the manner of carrying out this invention. It is to be understood that forms of the invention
herein shown and described are to be taken as illustrative embodiments. Various changes may be made in the type
and arrangement of the methods, parts and components. Equivalent elements may be substituted for those illustrated
and described herein, and certain features of the invention may be utilized independently of the use of other features,
all as would be apparent to one skilled in the art after having the benefit of this description of this invention.



Claims 

1. A method of obtaining motion estimation vectors for a block-based video compression scheme comprising:

partitioning a current frame and a reference frame into blocks;
selecting a current block from said current frame;

calculating a convolution of said current block and at least a portion of said reference frame;
selecting for said current block a set of reference blocks from said portion of said reference frame; and
searching for a reference block, from said set of reference blocks, having a mean square error with said current
block that is no larger than a mean square error between said current block and any other reference block of
said set of reference blocks.

2. The method of claim 1, further comprising data from said calculating step being utilized in said searching step.

3. The method of claim 2 in which said set of reference blocks includes each block within said reference frame. 

4. The method of claim 2 in which said data is stored in a tabular form.

5. The method of claim 2, said searching step further comprising:

obtaining from said convolution calculation, for a plurality of reference blocks from said set of reference blocks,
the inner product of said reference blocks and said current block;

calculating, for each reference block of said plurality of reference blocks, the squared norm of said reference 
block; 

calculating, for each reference block of said plurality of reference blocks, the sum of said squared norm and 
the negative of twice said inner product; and 

selecting a reference block from said plurality of reference blocks for which said sum is no larger than the sum
obtained for any other reference block of said plurality of reference blocks.

6. The method of claim 5 in which said plurality of reference blocks includes each reference block from said set of 
reference blocks. 

7. The method of claim 5, said calculating sum of squared norm step further comprising:

computing a running sum of squares within said reference frame. 

8. The method of claim 7, said computing running sum of squares step comprising: 

dividing said reference frame into overlapping rows of pixels; 

summing the squared norms of the pixels in each column of said rows using, for each row beyond the first,
the sums from the previous row to aid in the computation; and 

summing the sums of said columns in each row to find the sum of squares of each reference block in said 
reference frame using, for each reference block beyond the first, the sums from previous reference blocks to 
aid in the computation. 

9. The method of claim 2, said calculating step further comprising: 

calculating a truncated convolution of said current block and said portion of said reference frame. 

10. The method of claim 2, said calculating step further comprising: 

discarding terms in said convolution for which said current block and said portion of said reference frame do
not completely overlap; and

utilizing the remaining terms from said convolution to aid said searching step. 

11. A method of obtaining motion vectors in an MPEG encoder comprising:

selecting a P-frame or I-frame from a group of pictures to be the reference frame;
selecting a B-frame or P-frame from said group of pictures to be the current frame;
partitioning said current frame and said reference frame from said group of pictures into blocks;
selecting a current block from said current frame;
calculating a convolution of said current block and at least a portion of said reference frame;

selecting for said current block a set of reference blocks from said portion of said reference frame; and
searching for a reference block, from said set of reference blocks, having a mean square error with said current
block that is no larger than a mean square error between said current block and any other reference block
of said set of reference blocks.

12. The method of claim 11, further comprising data from said calculating step being utilized in said searching step.

13. The method of claim 12 in which said set of reference blocks includes each block within said reference frame.

14. The method of claim 12 in which said data is stored in a tabular form.

15. The method of claim 12, said searching step further comprising: 

obtaining from said convolution calculation, for a plurality of reference blocks from said set of reference blocks,
the inner product of said reference blocks and said current block;

calculating, for each reference block from said plurality of reference blocks, the squared norm of said reference
block;

calculating, for each reference block from said plurality of reference blocks, the sum of said squared norm and
the negative of twice said inner product; and
selecting a reference block, from said plurality of reference blocks, for which said sum is no larger than the
sum obtained for any other reference block from said plurality of reference blocks.

16. The method of claim 15, in which said plurality of reference blocks includes each block from said set of reference 
blocks. 

17. The method of claim 12, said calculating and storing step further comprising:

discarding terms in said convolution for which said current block and said reference frame do not completely
overlap; and

utilizing the remaining terms from said convolution to aid said searching step.

18. The method of claim 12, said calculating step further comprising: 

calculating a truncated convolution of said current block and said portion of said reference frame. 

19. A data compression method of finding a first sub-image of a first image frame, said first sub-image corresponding
to a second sub-image in a second image frame, said method comprising:

calculating a convolution of said second sub-image with at least a portion of said first image frame;
obtaining from said calculating step the inner product of a plurality of first image frame sub-images and said
second sub-image;

calculating, for each of said plurality of first frame sub-images, the sum of the squared norm of said plurality 
of first frame sub-images and the negative of twice said inner product; and 

selecting said first sub-image to be the sub-image for which said sum is no larger than the sum obtained for 
any other of said plurality of first frame sub-images. 

20. The method of claim 19, said calculating a convolution step further comprising: 

discarding terms in said convolution for which said second sub-image and said plurality of first frame sub-images
do not completely overlap; and

utilizing the remaining terms from said convolution to aid said searching step. 

21. The method of claim 19, said calculating a convolution step further comprising:

calculating a truncated convolution. 

22. A method for compressing moving image data using motion estimation vectors comprising: 

providing a reference image frame, said reference image frame being divided into a plurality of sub-images;
providing a current image frame, said current image frame being divided into a plurality of current frame
sub-images;

selecting for at least one of said current frame sub-images a corresponding reference frame sub-image by
identifying at least one reference frame sub-image that provides a minimum mean square error between said
reference frame sub-images and said current frame sub-images; and

determining said mean square error by performing a calculation comprising a running sum of squares and a 
convolution. 

23. The method of claim 22, said convolution being a truncated convolution.

24. The method of claim 22 wherein said minimum is determined over a search region.

25. The method of claim 22, said convolution being the convolution of said current frame sub-image and said plurality
of reference frame sub-images.

26. The method of claim 25, further comprising: 

discarding terms in said convolution for which said current frame sub-image and said plurality of reference 
frame sub-images do not completely overlap; and 
utilizing the remaining terms in said determining step.

27. The method of claim 22, said running sum of squares being the running sum of squares of said plurality of reference
frame sub-images.

28. The method of claim 27, said convolution being the convolution of said current frame sub-image and said plurality
of reference frame sub-images.
Figure 6 (fragment; only the Quandalle FFT entries are legible and the column headings are not recoverable)

Quandalle FFT: 4,682 m, 41,406 a | 7,876 m, 92,546 a | 11,396 m, 92,220 a | 26,422 m, 313,322 a (m: multiplications, a: additions)